Hebbian plasticity in winner-take-all (WTA) networks is highly attractive for
neuromorphic on-chip learning, owing to its efficient, local, unsupervised, and
on-line nature. Moreover, its biological plausibility may help overcome
important limitations of artificial algorithms, such as their susceptibility to
adversarial attacks and their high demands for training-example quantity and
repetition. However, Hebbian WTA learning has found little use in machine
learning (ML), likely because it has lacked an optimization theory
compatible with deep learning (DL). Here we show rigorously that WTA networks
constructed from standard DL elements, combined with a Hebbian-like plasticity
that we derive, maintain a Bayesian generative model of the data. Importantly,
without any supervision, our algorithm, SoftHebb, minimizes cross-entropy, a
loss function commonly used in supervised DL. We show this theoretically and in
practice. The key is a "soft" WTA where there is no absolute "hard" winner
neuron. Strikingly, in shallow-network comparisons with backpropagation (BP),
SoftHebb shows advantages beyond its Hebbian efficiency. Namely, it converges
in fewer iterations, and is significantly more robust to noise and adversarial
attacks. Notably, attacks that maximally confuse SoftHebb are also confusing to
the human eye, potentially linking human perceptual robustness with the Hebbian
WTA circuits of the cortex. Finally, SoftHebb can generate synthetic objects as
interpolations of real object classes. All in all, Hebbian efficiency,
theoretical underpinning, cross-entropy minimization, and surprising empirical
advantages suggest that SoftHebb may inspire highly neuromorphic and radically
different, but practical and advantageous learning algorithms and hardware
accelerators.
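
To make the "soft" WTA idea above concrete, the following is a minimal NumPy sketch of an unsupervised soft winner-take-all layer with a local, Hebbian-like update. The layer sizes, learning rate, and the Oja-style decay term are illustrative assumptions for exposition, not the exact plasticity rule derived in the paper.

import numpy as np

rng = np.random.default_rng(0)

n_in, n_hidden = 784, 100        # illustrative sizes (e.g. flattened image inputs)
lr = 0.01                        # assumed learning rate
W = rng.normal(scale=0.1, size=(n_hidden, n_in))   # one weight vector per neuron

def soft_wta_step(x, W, lr):
    """One unsupervised update of a soft winner-take-all layer.

    Instead of a single 'hard' winner, a softmax distributes plasticity
    across all neurons in proportion to their activation.
    """
    u = W @ x                            # pre-activations
    y = np.exp(u - u.max())              # numerically stable softmax
    y /= y.sum()                         # soft competition: no absolute winner
    # Hebbian-like, purely local update with an Oja-style decay term
    # (an illustrative stand-in for the rule derived in the paper)
    W += lr * y[:, None] * (x[None, :] - u[:, None] * W)
    return W, y

# usage: stream unlabeled inputs one at a time (online learning)
for _ in range(1000):
    x = rng.random(n_in)                 # placeholder for a real data sample
    W, y = soft_wta_step(x, W, lr)

The key points the sketch illustrates are the locality of the update (each weight change depends only on its own neuron's input and output) and the absence of labels, gradients, or a backward pass.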